NSF PAR Search | NSF Public Access Repository

"My Very Subjective Human Interpretation": Domain Expert Perspectives on Navigating the Text Analysis Loop for Topic Models

https://doi.org/10.1145/3701201

Schofield, Alexandra; Wu, Siqi; Bayard_de_Volo, Theo; Kuze, Tatsuki; Gomez, Alfredo; Sultana, Sharifa (January 2025, Proceedings of the ACM on Human-Computer Interaction - GROUP)

Practitioners dealing with large text collections frequently use topic models such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) in their projects to explore trends. Despite twenty years of accrued advancement in natural language processing tools, these models are found to be slow and challenging to apply to text exploration projects. In our work, we engaged with practitioners (n=15) who use topic modeling to explore trends in large text collections to understand their project workflows and investigate which factors often slow down the processes and how they deal with such errors and interruptions in automated topic modeling. Our findings show that practitioners are required to diagnose and resolve context-specific problems with preparing data and models and need control for these steps, especially for data cleaning and parameter selection. Our major findings resonate with existing work across CSCW, computational social science, machine learning, data science, and digital humanities. They also leave us questioning whether automation is actually a useful goal for tools designed for topic models and text exploration.

Full Text Available

Search for: All records